5 research outputs found

    Learned Semantic Multi-Sensor Depth Map Fusion

    Full text link
    Volumetric depth map fusion based on truncated signed distance functions has become a standard method and is used in many 3D reconstruction pipelines. In this paper, we generalize this classic method in several ways: 1) Semantics: semantic information enriches the scene representation and is incorporated into the fusion process. 2) Multi-sensor: depth information can originate from different sensors or algorithms with very different noise and outlier statistics, which are accounted for during data fusion. 3) Scene denoising and completion: sensors can fail to recover depth for certain materials and lighting conditions, or data can be missing due to occlusions; our method denoises the geometry, closes holes and computes a watertight surface for every semantic class. 4) Learning: we propose a neural network reconstruction method that unifies all these properties within a single powerful framework. Our method learns sensor or algorithm properties jointly with semantic depth fusion and scene completion, and can also be used as an expert system, e.g. to unify the strengths of various photometric stereo algorithms. Our approach is the first to unify all these properties. Experimental evaluations on both synthetic and real datasets demonstrate clear improvements.
    Comment: 11 pages, 7 figures, 2 tables, accepted for the 2nd Workshop on 3D Reconstruction in the Wild (3DRW2019) in conjunction with ICCV 2019
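
    For orientation, the classic volumetric fusion that this paper generalizes boils down to a per-voxel weighted running average of truncated signed distances. The sketch below illustrates that baseline update in Python; the function name, arguments and the idea of passing a per-voxel confidence (a stand-in for the learned, sensor-dependent weighting described in the abstract) are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

def tsdf_update(tsdf, weights, new_sdf, new_weight, trunc=0.05):
    """Classic per-voxel TSDF fusion: weighted running average of
    truncated signed distances (the baseline the paper generalizes).

    tsdf, weights : current volume values and accumulated weights
    new_sdf       : signed distances derived from the latest depth map
    new_weight    : confidence of the new observation; a constant in the
                    classic formulation, sensor-dependent (and learned)
                    in the setting the paper describes
    """
    d = np.clip(new_sdf, -trunc, trunc)  # truncate the signed distance
    fused = (weights * tsdf + new_weight * d) / (weights + new_weight + 1e-9)
    return fused, weights + new_weight

# Toy usage: fuse two noisy depth-derived SDF volumes into one grid.
vol = np.zeros((8, 8, 8))
w = np.zeros_like(vol)
vol, w = tsdf_update(vol, w, np.random.randn(8, 8, 8) * 0.02, 1.0)
vol, w = tsdf_update(vol, w, np.random.randn(8, 8, 8) * 0.02, 1.0)
```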

    Learned Multi-View Texture Super-Resolution

    Full text link
    We present a super-resolution method capable of creating a high-resolution texture map for a virtual 3D object from a set of lower-resolution images of that object. Our architecture unifies the concepts of (i) multi-view super-resolution based on the redundancy of overlapping views and (ii) single-view super-resolution based on a learned prior of high-resolution (HR) image structure. The principle of multi-view super-resolution is to invert the image formation process and recover the latent HR texture from multiple lower-resolution projections. We map that inverse problem onto a block of suitably designed neural network layers and combine it with a standard encoder-decoder network for learned single-image super-resolution. Wiring the image formation model into the network avoids having to learn the perspective mapping from textures to images and elegantly handles a varying number of input views. Experiments demonstrate that the combination of multi-view observations and a learned prior yields improved texture maps.
    Comment: 11 pages, 5 figures, 2019 International Conference on 3D Vision (3DV)
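
    The inverse problem the abstract refers to can be written as y_k = A_k t for each view k, where t is the flattened latent HR texture and A_k encodes the known perspective mapping plus blur and downsampling. A minimal least-squares sketch of that inversion is given below; it is only a hand-crafted stand-in for the network layers the paper uses, and the function and variable names are assumptions for illustration.

```python
import numpy as np

def recover_texture(observations, operators, lam=1e-3):
    """Least-squares sketch of multi-view texture super-resolution:
    each low-resolution view y_k is modelled as y_k = A_k @ t, and the
    latent texture t is recovered by solving the stacked normal
    equations. The paper instead unrolls this inversion into network
    layers and adds a learned single-image prior."""
    n = operators[0].shape[1]
    AtA = lam * np.eye(n)       # small Tikhonov term keeps the system well-posed
    Atb = np.zeros(n)
    for y, A in zip(observations, operators):
        AtA += A.T @ A
        Atb += A.T @ y
    return np.linalg.solve(AtA, Atb)

# Toy usage: three random "views" of a 16-pixel texture, 8 samples each.
rng = np.random.default_rng(0)
t_true = rng.random(16)
ops = [rng.random((8, 16)) for _ in range(3)]
obs = [A @ t_true + 0.01 * rng.standard_normal(8) for A in ops]
t_est = recover_texture(obs, ops)
```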

    Towards Large Scale Dense Semantic 3D Reconstruction

    No full text
    With the increasing digitisation of various industries requiring digital twins for virtual interaction or simulation, the need for methods that generate 3D models is more pressing than ever. In particular, 3D reconstruction from images can play a significant part in the solution. When performed together with semantic segmentation, it yields 3D models that carry information about the nature of the scene components, which is valuable for applications in mixed reality and robotics. Moreover, 3D reconstruction and semantics reinforce each other: shape priors enable reconstruction in areas poorly supported by data. One state-of-the-art approach casts the problem as voxel labelling, which can be solved with convex optimisation. It obtains high-quality geometry for each semantic category, but at the cost of high memory consumption. State-of-the-art methods are therefore poorly scalable, which limits the size and resolution of the scenes they can process as well as the number of semantic categories they can consider. We present several contributions to tackle this challenge. The first is a novel data structure that divides the scene into voxel blocks, each of which stores only the subset of semantic categories present in it. However, it still relies on hand-crafted shape priors, and designing them becomes intractable as the number of categories grows. We therefore introduce a lightweight neural network, based on variational inference, that learns these priors. We then extend it to an octree-based sparse volumetric scene representation that addresses semantic and spatial scalability together. Moreover, we propose a sensor-adaptive module that allows our network to process scans degraded by heterogeneous noise models. We conclude with preliminary work on a new method that encodes the scene into a latent orthographic space, built incrementally from aerial views and decoded into a semantic 3D scene. Throughout, experiments on indoor, outdoor, real and synthetic datasets with up to 40 categories show that our contributions achieve significant reductions in memory footprint while producing accurate reconstructions without requiring large amounts of training data.
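
    To make the first contribution concrete, the sketch below shows one plausible shape such a voxel-block structure could take: blocks are allocated lazily and each block stores values only for the semantic categories actually observed inside it. The class and method names, block size and storage layout are illustrative assumptions, not the thesis's actual implementation.

```python
from collections import defaultdict
import numpy as np

class SemanticVoxelBlockGrid:
    """Illustrative sparse scene storage: the volume is split into
    fixed-size voxel blocks, allocated on demand, and each block keeps
    per-voxel values only for the subset of semantic categories seen
    in it, so unused categories cost no memory."""

    def __init__(self, block_size=8, voxel_size=0.05):
        self.block_size = block_size
        self.voxel_size = voxel_size
        # block index (ix, iy, iz) -> {category id -> dense per-voxel values}
        self.blocks = defaultdict(dict)

    def _block_index(self, point):
        extent = self.block_size * self.voxel_size
        return tuple((np.asarray(point) // extent).astype(int))

    def integrate(self, point, category, value):
        """Allocate the enclosing block on demand and store a value for
        the given semantic category at the voxel containing `point`."""
        block = self.blocks[self._block_index(point)]
        if category not in block:
            block[category] = np.zeros((self.block_size,) * 3, dtype=np.float32)
        local = (np.asarray(point) / self.voxel_size).astype(int) % self.block_size
        block[category][tuple(local)] = value

# Toy usage: one observation allocates a single block with one category.
grid = SemanticVoxelBlockGrid()
grid.integrate((1.23, 0.45, 2.10), category=3, value=0.7)
print(len(grid.blocks), "block(s) allocated")
```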